ABSTRACT

The properties of an approximation to the standard error of measurement
were described and illustrated with hypothetical data. It was concluded
that the approximation is a systematic overestimate of the standard
error of measurement computed in the usual way with Kuder-Richardson
formula 20. The relative error of the approximation was small for
what was thought to represent many longer tests. However, for short,
internally consistent tests of the type used in instructional
programs, the relative error can be quite large.

Properties of a Proposed Approximation
to the Standard Error of Measurement

The purpose of this paper is to examine some of the properties of an
approximation formula for the standard error of measurement that was
recently proposed by Garvin (1976). Examining the properties of this
approximation would seem to be necessary because it has been recommended
for use with classroom tests solely on the basis of its computational
simplicity. Further, the empirical examples used to illustrate its use
were not complete enough to judge the usefulness of the approximation for
a wide range of classroom tests. Those using the proposed approximation
may not be aware of its properties, and recommendations for using it
may well be tempered by a discussion of them.

The Proposed Approximation

The proposal is to approximate the standard error of measurement (SEM)
by the following formula (Garvin, 1976, p. 102):

$$\mathrm{SEM}' = \frac{1}{N}\sqrt{\sum T(N - T)} \qquad (1)$$

where N = the number of examinees taking the test
and T = the number of examinees answering a given item correctly.

The approximation is intended to apply to tests of k items, each of which
is scored zero or one.

Formula (1) is derived by substituting N for N - 1 and k for k - 1 in
the formula:

$$\mathrm{SEM} = \sigma\sqrt{1 - \mathrm{KR20}} \qquad (2)$$

where

$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N - 1} \qquad (3)$$

$$\mathrm{KR20} = \frac{k}{k - 1}\left[1 - \frac{\sum pq}{S^2}\right] \qquad (4)$$

and

$$S^2 = \frac{\sum (X - \bar{X})^2}{N} \qquad (5)$$

The symbols in these formulas have their usual meanings.
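As a minimal sketch of formulas (1) through (5), the Python below computes both indices from a matrix of zero-one item scores. The function name `sem_indices` and the small `demo` data set are hypothetical and were introduced only for illustration; they do not come from Garvin's article.

```python
import math


def sem_indices(scores):
    """Compute SEM and SEM' from rows of zero-one item scores.

    scores: list of examinee rows, each a list of k item scores (0 or 1).
    Assumes k > 1 and a nonzero total-score variance.
    """
    n = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    sum_sq = sum((x - mean) ** 2 for x in totals)

    s2 = sum_sq / n            # formula (5): divisor N
    sigma2 = sum_sq / (n - 1)  # formula (3): divisor N - 1

    # T = number of examinees answering each item correctly
    t_counts = [sum(row[j] for row in scores) for j in range(k)]
    sum_pq = sum(t * (n - t) for t in t_counts) / n ** 2

    kr20 = (k / (k - 1)) * (1 - sum_pq / s2)  # formula (4)
    return {
        "S2": s2,
        "sigma2": sigma2,
        "KR20": kr20,
        "SEM": math.sqrt(sigma2) * math.sqrt(1 - kr20),  # formula (2)
        "SEM_prime": math.sqrt(sum_pq),                  # formula (1)
    }


# Hypothetical responses of six examinees to a four-item test.
demo = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
result = sem_indices(demo)
print(result)
```

Computing formula (1) as the square root of the summed item variances gives the same value as dividing the square root of the sum of T(N - T) by N, since each item variance pq equals T(N - T)/N^2.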

Some Properties of SEM

It should be noted that formula (2) is appropriate under certain
conditions. One of these conditions is that KR20 is equal to the
reliability of the test in question. The necessary and sufficient
conditions under which this is true are called essential tau-equivalence
(Novick & Lewis, 1967). If the true scores of the items of a test are
not at least essentially tau-equivalent, KR20 will underestimate the
test reliability, as defined in the classical sense, and the
standard error of measurement will be overestimated. Additional problems
exist: $\sigma$ generally is not an unbiased estimate of the population standard
deviation and KR20 is a biased estimate of its corresponding population
value (Kristof, 1963). However, for many commercially available tests the
standard error of measurement is determined using KR20. For classroom
tests, most introductory testing and measurement texts express SEM in terms
of S rather than $\sigma$. This distinction will make a difference, as will be
discussed below.



Properties of SEM'

Although it is obvious, it should be stated that

$$\mathrm{SEM}' = \sqrt{\sum pq} \qquad (6)$$

where $\sum pq$ is the sum of the k item variances. If SEM' is to be recommended,
then an explanation of the relationship of the sum of the item variances
to the total test error variance would seem to be in order.

Kuder-Richardson formula 20 in its general form is known as coefficient
alpha (Cronbach, 1951). Using the notation of coefficient alpha and under
the assumptions of at least essential tau-equivalence, it can be shown that

$$\sum \sigma_j^2 = \frac{\sigma_{T_X}^2}{k} + \sigma_{E_X}^2 \qquad (7)$$

where $\sigma_j^2$ = the observed score variance of item j,
$\sigma_{T_X}^2$ = the true score variance of the k-item test,
and $\sigma_{E_X}^2$ = the error score variance of the k-item test.

While it is true that tests composed of items scored zero or one
violate the assumptions under which equation (7) was derived (see, for example,
Feldt, 1965), this expression would seem to hold well enough when $\sum pq$ is
substituted for $\sum \sigma_j^2$ to conclude that the square of SEM' estimates something
more than the error variance of the test. If expression (7) is true, then
attempting to estimate the error variance via (SEM')^2 could be in
serious error.



Classroom tests that would be used, say, to assess competency over
small instructional units would be relatively short and possibly quite
internally consistent. Such short tests seem to be used quite frequently
in the classroom. In such cases, the fraction $\sigma_{T_X}^2/k$ is likely to be high
relative to $\sigma_{E_X}^2$. For example, Hsu (1971) reports data for four-item
tests that measure attainment of single instructional objectives. Some
of the KR20 values he reported were higher than .90. One test had
KR20 equal to .97 (N = …, S = 1.91). In this case the value of
SEM' is three times that of SEM (SEM' = .997, SEM = .331).
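The two values quoted for the Hsu (1971) test can be reproduced from k, S, and KR20 alone. The sketch below assumes that SEM is computed with S, which is consistent with the reported figures, and uses the identity that the sum of the item variances equals S^2[1 - ((k - 1)/k) KR20], obtained by rearranging formula (4). The variable names are illustrative only.

```python
import math

# Values reported in the text for one four-item test (Hsu, 1971).
k = 4
s = 1.91
kr20 = 0.97

# SEM computed in the usual way, with S as the test standard deviation.
sem = s * math.sqrt(1 - kr20)

# SEM' = sqrt(sum of item variances), rewritten in terms of S and KR20
# by rearranging formula (4).
sem_prime = s * math.sqrt(1 - (k - 1) / k * kr20)

print(round(sem, 3), round(sem_prime, 3), round(sem_prime / sem, 2))
```

The ratio of the two values is close to three, which is the discrepancy described above for a short, highly consistent test.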

To study how SEM' differs systematically from SEM we need to express
them in comparable terms. Manipulating formula (4) gives the following
result:

$$(\mathrm{SEM}')^2 = S^2\left[1 - \frac{k - 1}{k}\,\mathrm{KR20}\right] \qquad (8)$$

Garvin chose to express SEM in terms of $\sigma$ instead of S. Since textbooks
typically use S, both cases are examined below.
If SEM is expressed in terms of S, then it follows that

$$(\mathrm{SEM}')^2 - (\mathrm{SEM})^2 = \frac{S^2\,\mathrm{KR20}}{k} \qquad (9)$$

$$\mathrm{SEM}' - \mathrm{SEM} = S\left[\sqrt{1 - \frac{k - 1}{k}\,\mathrm{KR20}} - \sqrt{1 - \mathrm{KR20}}\right] \qquad (10)$$

and

$$\mathrm{SEM}' \ge \mathrm{SEM} \qquad (11)$$

When the observed score variance of the test is computed as $S^2$ for
both KR20 and SEM, the approximation SEM' is an overestimate of SEM except
when KR20 = 0. For fixed test length k, the difference in the brackets
of equation (10) is a monotonically increasing function of KR20. It
increases rapidly at higher values of KR20 and gives a J-shaped appearance
when graphed. When KR20 equals one, SEM' is equal to $S/\sqrt{k}$, whereas SEM
equals zero.
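The behavior of the bracketed difference in equation (10) can be tabulated directly. The following is a small sketch, with a hypothetical test length of k = 10 chosen only for illustration; the function name `bracket_difference` is not from the source.

```python
import math


def bracket_difference(kr20, k):
    """Bracketed term of equation (10): (SEM' - SEM) / S when S is used throughout."""
    return math.sqrt(1 - (k - 1) / k * kr20) - math.sqrt(1 - kr20)


k = 10  # hypothetical test length
grid = [r / 10 for r in range(11)]  # KR20 = 0.0, 0.1, ..., 1.0
values = [bracket_difference(r, k) for r in grid]

for r, v in zip(grid, values):
    print(f"KR20 = {r:.1f}  (SEM' - SEM)/S = {v:.4f}")
```

The printed column starts at zero, rises slowly at first and then steeply, and ends at 1/sqrt(k), which matches the J-shaped description.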

If SEM is expressed in terms of $\sigma$ and if KR20 is expressed in terms of
S, then expression (10) becomes

$$\mathrm{SEM}' - \mathrm{SEM} = S\left[\sqrt{1 - \frac{k - 1}{k}\,\mathrm{KR20}} - \sqrt{\frac{N}{N - 1}\left(1 - \mathrm{KR20}\right)}\right] \qquad (12)$$

In this case the bracketed difference is also a monotonically increasing,
J-shaped function of KR20 for fixed test length k. However, the following
relationships hold.



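$$\mathrm{SEM}' > \mathrm{SEM}, \text{ when } \frac{k}{(N - 1) + k} < \mathrm{KR20} \le 1, \qquad (13)$$

$$\mathrm{SEM}' = \mathrm{SEM}, \text{ when } \mathrm{KR20} = \frac{k}{(N - 1) + k}, \qquad (14)$$

and

$$\mathrm{SEM}' < \mathrm{SEM}, \text{ when } 0 \le \mathrm{KR20} < \frac{k}{(N - 1) + k}. \qquad (15)$$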



Alternately, we can write that

$$S\left[1 - \sqrt{\frac{N}{N - 1}}\right] \le \mathrm{SEM}' - \mathrm{SEM} \le \frac{S}{\sqrt{k}} \qquad (16)$$

when 0 ≤ KR20 ≤ 1.
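When SEM uses the N - 1 divisor and KR20 uses the N divisor, the sign of SEM' - SEM changes at KR20 = k/((N - 1) + k). The sketch below checks that crossover numerically. The values N = 30 and k = 10 and the function names are hypothetical, chosen only for illustration.

```python
import math


def crossover_kr20(n, k):
    """KR20 value at which SEM' equals SEM when SEM uses the N - 1 divisor."""
    return k / ((n - 1) + k)


def diff_over_s(kr20, n, k):
    """(SEM' - SEM) / S with SEM based on sigma and KR20 based on S."""
    sem_prime = math.sqrt(1 - (k - 1) / k * kr20)
    sem = math.sqrt(n / (n - 1) * (1 - kr20))
    return sem_prime - sem


n, k = 30, 10  # hypothetical class size and test length
crossover = crossover_kr20(n, k)

print(f"crossover KR20 = {crossover:.4f}")
for r in (0.0, crossover, 0.5, 1.0):
    print(f"KR20 = {r:.4f}  (SEM' - SEM)/S = {diff_over_s(r, n, k):+.4f}")
```

For these values the crossover is 10/39, so SEM' falls below SEM only for tests with quite low internal consistency.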
A Relationship of SEM' to Lord's Formulation

The values obtained for SEM' in Garvin's article were contrasted to
Lord's (1957) formulation of the standard error of measurement for
individuals at a specific score point. Lord's formulation assumes that
the k items of the test are a random sample from a very large domain
of items. Under the conditions specified in Lord's development, the
estimated error variance for individuals attaining a number-right score
of $X_i$ is

$$\sigma_{E_i}^2 = \frac{X_i(k - X_i)}{k - 1} \qquad (17)$$

Since SEM' is intended to approximate SEM, the value of comparing
SEM' to $\sigma_{E_i}$ should be questioned. One way to interpret (SEM)^2 is as the
average of all examinees' individual error score variances. If all
individuals are measured with equal accuracy, then (SEM)^2 will apply equally
well to each score level; otherwise, it will not. Since $\sigma_{E_i}$ reflects the
idea that all persons are not measured equally well, it may be more useful
to teachers than either SEM' or SEM.

However, if one is to compare SEM' with SEM, then to be consistent
one should compare SEM' with an estimate based on the average of the
$\sigma_{E_i}$-values over all persons tested. Lord (1955) has shown that this average
is

$$\mathrm{SEM}_L = S\sqrt{1 - \mathrm{KR21}} \qquad (18)$$

where $\mathrm{SEM}_L$ = the estimated average standard error
of measurement based on Lord's formulation,

and

$$\mathrm{KR21} = \frac{k}{k - 1}\left[1 - \frac{\bar{X}(k - \bar{X})}{k S^2}\right] \qquad (19)$$
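The link between Lord's individual error variances and KR21 can be checked on any set of number-right scores: the mean of X(k - X)/(k - 1) over examinees equals S^2(1 - KR21) when S^2 uses the N divisor. The following is a minimal sketch with hypothetical scores; the function name `lord_average_check` is an illustration, not part of Lord's development.

```python
def lord_average_check(totals, k):
    """Return (mean individual error variance, S^2 * (1 - KR21)).

    totals: number-right scores on a k-item test.
    The variance S^2 uses the N divisor, as in formula (5).
    """
    n = len(totals)
    mean = sum(totals) / n
    s2 = sum((x - mean) ** 2 for x in totals) / n

    # Lord's estimated error variance for each examinee's score
    person_vars = [x * (k - x) / (k - 1) for x in totals]
    avg_var = sum(person_vars) / n

    # Standard KR21, which needs only the mean and variance of the scores
    kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * s2))
    return avg_var, s2 * (1 - kr21)


# Hypothetical number-right scores for six examinees on a four-item test.
avg_var, sem_l_squared = lord_average_check([4, 3, 2, 1, 0, 4], k=4)
print(avg_var, sem_l_squared)
```

Both returned values agree, so the square root of either one is the average standard error based on Lord's formulation for these scores.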



The comparisons that are of interest are

$$(\mathrm{SEM}')^2 - (\mathrm{SEM}_L)^2 = S^2\left[\mathrm{KR21} - \frac{k - 1}{k}\,\mathrm{KR20}\right] \qquad (20)$$

and

$$(\mathrm{SEM}_L)^2 - (\mathrm{SEM})^2 = S^2\left(\mathrm{KR20} - \mathrm{KR21}\right) \qquad (21)$$

If all of the test items have the same difficulty value, then KR20
is equal to KR21 and

$$(\mathrm{SEM}')^2 - (\mathrm{SEM}_L)^2 = \frac{S^2\,\mathrm{KR20}}{k} \qquad (22)$$

Under these special conditions $\mathrm{SEM}_L$ is identical to SEM; otherwise, $\mathrm{SEM}_L$
will be larger than SEM. The value of SEM', however, will still maintain
the relationships to SEM that are described by the equations in the preceding
section.

Tucker (1949) has shown that, in general, KR20 is larger than KR21
by an amount equal to

$$\frac{k^2 S_p^2}{(k - 1)S^2} \qquad (23)$$

where $S_p^2$ is the variance of the item difficulties of the test. This means
that the difference expressed in equation (20) is a function of the item
difficulties of the test. We can express this difference as

$$(\mathrm{SEM}')^2 - (\mathrm{SEM}_L)^2 = \frac{S^2\,\mathrm{KR20}}{k} - \frac{k^2 S_p^2}{k - 1} \qquad (24)$$

Similarly, we can rewrite equation (21) as

$$(\mathrm{SEM}_L)^2 - (\mathrm{SEM})^2 = \frac{k^2 S_p^2}{k - 1} \qquad (25)$$
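Tucker's result can be verified numerically on zero-one data. The sketch below computes KR20, KR21, and the variance of the item difficulties for a small hypothetical response matrix and compares the gap KR20 - KR21 with k^2 S_p^2 / ((k - 1) S^2). It assumes that both variances use their own count as divisor (N for the test scores, k for the item difficulties); the function name and data are illustrative only.

```python
def kr20_kr21_gap(scores):
    """Return (KR20, KR21, Tucker's expression) for rows of zero-one item scores."""
    n = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    s2 = sum((x - mean) ** 2 for x in totals) / n  # divisor N

    # Item difficulties p = proportion answering each item correctly
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)

    kr20 = (k / (k - 1)) * (1 - sum_pq / s2)
    kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * s2))

    p_bar = sum(p) / k
    sp2 = sum((pj - p_bar) ** 2 for pj in p) / k  # divisor k
    tucker = k ** 2 * sp2 / ((k - 1) * s2)
    return kr20, kr21, tucker


# Hypothetical responses of six examinees to a four-item test.
demo = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
kr20, kr21, tucker = kr20_kr21_gap(demo)
print(kr20, kr21, kr20 - kr21, tucker)
```

Multiplying the gap by S^2 gives the difference between the squared Lord-based index and the squared SEM, which depends only on k and the spread of the item difficulties.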



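By applying Tucker's (1949, formula 26) result along with equations
(6) and (18), it can be shown that for k greater than one,

$$\mathrm{SEM}' < \mathrm{SEM}_L \text{ when } \frac{\mathrm{KR20}}{k} > \frac{\mathrm{KR21}}{k - 1}; \qquad (26)$$

$$\mathrm{SEM}' = \mathrm{SEM}_L \text{ when } \frac{\mathrm{KR20}}{k} = \frac{\mathrm{KR21}}{k - 1}; \qquad (27)$$

and

$$\mathrm{SEM}' > \mathrm{SEM}_L \text{ when } \frac{\mathrm{KR20}}{k} < \frac{\mathrm{KR21}}{k - 1}. \qquad (28)$$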



Taking into account equations (11) and (25) through (28), we can
state the following relationships among the three estimators of the
standard error of measurement:

$\mathrm{SEM}_L$ > SEM' > SEM, if condition (26) holds and
if KR20 > KR21; (29)

$\mathrm{SEM}_L$ = SEM' > SEM, if condition (27) holds and
if KR20 > KR21; (30)

and SEM' > $\mathrm{SEM}_L$ > SEM, if condition (28) holds and
if KR20 > KR21. (31)

All three expressions are equal to S when KR20 = KR21 = 0. When
KR20 = KR21 ≠ 0, then $\mathrm{SEM}_L$ is equal to SEM, but SEM' is still greater
than SEM as shown by equation (10).
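The ordering of the three estimators can be read off from KR20/k and KR21/(k - 1). The sketch below computes all three indices with S as the test standard deviation and reports which ordering applies. The input values are hypothetical and the function names are illustrative; the comparison uses a tolerance because exact equality of floating-point ratios is fragile.

```python
import math


def three_estimators(s, k, kr20, kr21):
    """Return (SEM, SEM', SEM_L), all based on the same S."""
    sem = s * math.sqrt(1 - kr20)
    sem_prime = s * math.sqrt(1 - (k - 1) / k * kr20)
    sem_lord = s * math.sqrt(1 - kr21)
    return sem, sem_prime, sem_lord


def ordering(k, kr20, kr21):
    """Classify the case, assuming KR20 > KR21 and k > 1."""
    lhs, rhs = kr20 / k, kr21 / (k - 1)
    if math.isclose(lhs, rhs):
        return "SEM_L = SEM' > SEM"
    if lhs > rhs:
        return "SEM_L > SEM' > SEM"
    return "SEM' > SEM_L > SEM"


# Two hypothetical four-item tests with the same S and KR20.
case_a = ordering(4, 0.80, 0.75)
case_b = ordering(4, 0.80, 0.30)
sem_a, prime_a, lord_a = three_estimators(2.0, 4, 0.80, 0.75)
sem_b, prime_b, lord_b = three_estimators(2.0, 4, 0.80, 0.30)
print(case_a, (sem_a, prime_a, lord_a))
print(case_b, (sem_b, prime_b, lord_b))
```

In the first case the item difficulties vary little, so the Lord-based index sits between SEM and SEM'; in the second the large gap between KR20 and KR21 pushes it above both.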
Representative Values of the Indices

Saupe (1961) has provided some representative values of test
statistics for three general types of tests. Table 1 is based on Saupe's
values and serves to illustrate the algebraic results obtained above.
It should be noted that in making the calculations for Table 1, the
values for KR20 and KR21 were carried to more decimal places than Saupe
presented. Also, Table 1 uses expression (5) for the test variance for
all computations.

Insert Table 1 about here



Two points may be noted from this table. First, as the average
item difficulty level approaches .50 and as the variance of the
item difficulties approaches zero, the discrepancies between all of the
indices become smaller. Secondly, $\mathrm{SEM}_L$ tends to be closer to SEM than
SEM' is, when the variance of the item difficulties is less than .02,
regardless of test length.

One would guess that most achievement tests would have distributions
of item difficulties with values ranging between .20 and .80. A uniform
distribution of item difficulties over this range would have $S_p^2$ = .03.
A symmetric, somewhat platykurtic distribution over the range .25 to .75
might be more typical of achievement tests designed to survey broad ranges
of achievement in a subject. Such a distribution would likely have $S_p^2$
equal to about .01. If one were to concentrate item difficulties over
a narrow range, say, .45 to .60, then a uniform distribution over this
range would have $S_p^2$ less than .01. It is in the latter two cases that
$S_p^2$ is smallest and $\mathrm{SEM}_L$ is closer to the value of SEM.
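The two uniform cases can be checked with the variance of a uniform distribution, (high - low)^2 / 12. The sketch below treats the item difficulties as continuously uniform over each range, which is an idealization for a finite set of items; the function name is illustrative.

```python
def uniform_variance(low, high):
    """Variance of a continuous uniform distribution on [low, high]."""
    return (high - low) ** 2 / 12


wide = uniform_variance(0.20, 0.80)    # difficulties spread from .20 to .80
narrow = uniform_variance(0.45, 0.60)  # difficulties concentrated near .50

print(f"range .20 to .80: variance = {wide:.4f}")
print(f"range .45 to .60: variance = {narrow:.4f}")
```

The wide range gives a variance of .03 and the narrow range gives about .002, well under .01.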

It should be noted, however, that the relative error of SEM' is
generally small for the values shown in Table 1, ranging from 8.8%
($\Delta_1$ = .280) to 2.2% ($\Delta_1$ = .048). The relative error for $\mathrm{SEM}_L$ is
generally more substantial for these values, and ranges from 38.7%
($\Delta_2$ = .555) to 0%. If it is true that most educational achievement
tests would have $S_p^2$ < .01, then Table 1 would indicate that the
relative error of SEM' is small, being between 2% and 5%, when the test
length is 20 items or more. The relative error for $\mathrm{SEM}_L$ is also small
for these values of $S_p^2$ and test length, ranging between 0% and 3%.

Summary

Recently, SEM' [as defined by formula (1)] was proposed as a
computationally simple approximation to the standard error of measurement
(SEM) for a test when this index is defined as in formula (2). Several
properties of SEM' were identified:

1. The index SEM' can be shown to be systematically related to
the true score variance of the test [formula (7)]. This means
that for short, very reliable tests, the relative error in SEM' can be
quite high.

2. For the same data, SEM' is always larger than SEM when KR20 > 0
and when the test's standard deviation is computed in the same way for both
SEM and for KR20. When SEM is defined as in formula (2) and when KR20
is defined as in formula (4), then SEM' can underestimate SEM for the
same data.

3. It is felt that the comparison of SEM' to Lord's $\sigma_{E_i}$ was
inappropriate, since SEM' attempts to approximate the average examinee's
error-score standard deviation, while $\sigma_{E_i}$ does not. The "appropriate"
comparison would be to $\mathrm{SEM}_L$ as defined in formula (18).

4. The relationship between SEM, SEM', and $\mathrm{SEM}_L$ depends on the
variance of the item difficulty indices, or, alternatively, on the ratio
of KR20 to KR21. These relationships are described by inequalities (26)
through (31).

5. If it is true that most educational achievement tests have
$S_p^2$ < .01, then the relative errors of both SEM' and $\mathrm{SEM}_L$ in approximating
SEM seem to be quite small when the number of items is over 20. The
relative error of $\mathrm{SEM}_L$ is somewhat smaller than the relative error of
SEM' for this range of $S_p^2$-values, however.

Whether the information above argues for or against recommending the
use of SEM' for classroom tests depends on whether one is inclined to
recommend computationally easier formulas that are known to be systematically
biased and that seem to lack conceptual relationships to the qualities of the
tests which they seek to estimate. If so, then SEM' has merit, at least
for longer tests with equal item difficulties.